Tech demo: switch from FFTW to pocketfft/ducc.fft #287
I think the CMake build should almost work now ... I just can't convince the CUDA compiler to use the C++17 standard. Giving up on this for now. |
I just noticed that you have actually been discussing this topic in |
This was actually part of the inspiration for my Anyway, this looks great. I'm sure we'll talk more about it soon. |
Actually, the FFT part of |
BTW, implementing #304 on this branch should be fairly simple, if you can tell me which parts of |
Let's do the 2D case, for upsampfac=2. Then the four corners of the 2D FFT array are the ones that copy to the user's uniform I/O data. (output for type1, input for type2).
get stacked (also applying "deconvolve" correction) to the U array [2,1;4,3] in matlab notation. 3D is analogous. The best way would be to use the indices I coded into It would be nice to implement this idea using fftw interface too, so we can test fftw and mkl. |
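The corner copy described above can be sketched as follows. This is a minimal illustration, not the code in finufft: the function name `copy_corners_2d` is hypothetical, the fine grid is assumed to be in standard FFT ordering (non-negative frequencies first, negative frequencies wrapped to the end), the output is assumed to run over increasing frequencies from -ms/2 to (ms-1)/2 in each dimension, and the "deconvolve" correction factor is omitted.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: gather the four low-frequency corners of an
// nf1 x nf2 FFT-ordered grid `fw` into an ms x mt array `fk` whose
// frequencies increase from -ms/2 .. (ms-1)/2 (and likewise for mt).
// Both arrays are stored with the first index varying fastest.
// The deconvolution factors are left out of this sketch.
void copy_corners_2d(const std::vector<cplx>& fw, long nf1, long nf2,
                     std::vector<cplx>& fk, long ms, long mt) {
  assert(ms <= nf1 && mt <= nf2);  // output must fit inside the fine grid
  const long k1min = -(ms / 2);
  const long k2min = -(mt / 2);
  for (long j2 = 0; j2 < mt; ++j2) {
    const long k2 = k2min + j2;                 // signed frequency, dim 2
    const long i2 = (k2 >= 0) ? k2 : nf2 + k2;  // wrapped index in fw
    for (long j1 = 0; j1 < ms; ++j1) {
      const long k1 = k1min + j1;                 // signed frequency, dim 1
      const long i1 = (k1 >= 0) ? k1 : nf1 + k1;  // wrapped index in fw
      fk[j1 + ms * j2] = fw[i1 + nf1 * i2];
    }
  }
}
```

With this index mapping, only entries within ms/2 (or mt/2) of a grid corner are ever read, so for upsampfac=2 roughly three quarters of the 2D fine grid is never touched by the copy.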
Thanks, that's what I arrived at as well! (There might still be off-by-one errors in some cases, but the test suite works ...
It's not hard, but unfortunately it will be quite verbose ... |
Hi Martin, Robert, Libin & co, Before you spend too much effort on this (and I notice you removed FFTW from all the make/cmake/CI files in your PR, which is quite a lot of work) we should strategize, since we would not be able to simply bring in your PR. The main reason is we don't want to stop having FFTW as an option. There are two main improvements on the table for the CPU code (for now), which are orthogonal to each other:
Sorry, I have to go for now, but any thoughts about this? Best, Alex |
Dear Alex, just one thing: please don't worry that I'm spending too much time on this and would be unhappy if it isn't merged! I have been doing this strictly for fun up to now, and I admit I was perhaps a bit over-eager when I started stripping the "-lfftw3 -lfftw3_omp" commands from the demos etc :-) Cheers, |
Hi @mreineck, We are thinking of providing a switchable FFT interface. At least, I would like to give users the option to test both. Second, the ducc license is permissive, no? It might be worth spending a bit of time manually vectorizing the bottlenecks to close the gap with FFTW. What do you think? Depending on the size of the task, I might be able to help a bit. Cheers, |
If you like, I can bring this branch up to date; shouldn't be too much work!
The full ducc package is released under GPLv2+, which isn't considered permissive any more by most people. But the FFT part (and all ducc source code it depends on) is also available under BSD3, which should be fine. Still, I recommend very thorough benchmarking before you decide to tweak the ducc FFT code any further. Most of the advantage that FFTW and MKL FFT have over ducc FFT comes (I'm pretty sure) from special passes for higher powers of 2 (length 16, 32, 64) that are not in ducc simply because their source code is huge. Vectorization should be pretty good overall, especially in multi-D transforms. My personal gut feeling is that the current implementation strikes a fairly good balance between performance and maintainability. |
If you can bring this up to date it would be great. I personally have two requirements:
I have not looked at the code for the powers of 2; it might be possible to generate them at compile time using templates instead of hardcoding, no? It might require C++17, but to keep ducc C++11-compatible these could be enabled only if C++17 is supported by the compiler. |
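As a rough illustration of the "generate at compile time with templates" idea, here is a minimal C++17 sketch of a radix-2 FFT whose recursion is resolved by the compiler for a fixed power-of-2 length. The name `fft_pow2` is hypothetical, and this is not how ducc, FFTW, or MKL build their special passes; it only shows that `if constexpr` lets one template cover lengths 16, 32, 64, ... without hand-written code per length.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

using cplx = std::complex<double>;

// Hypothetical sketch: in-place forward FFT of fixed length N (a power of 2).
// The recursion depth and all sub-lengths are known at compile time.
template <std::size_t N>
void fft_pow2(cplx* x) {
  static_assert(N >= 1 && (N & (N - 1)) == 0, "N must be a power of 2");
  assert(x != nullptr);
  if constexpr (N > 1) {
    // Split into even- and odd-indexed halves (decimation in time).
    std::array<cplx, N / 2> even, odd;
    for (std::size_t i = 0; i < N / 2; ++i) {
      even[i] = x[2 * i];
      odd[i] = x[2 * i + 1];
    }
    fft_pow2<N / 2>(even.data());
    fft_pow2<N / 2>(odd.data());
    // Combine the halves with the twiddle factors exp(-2*pi*i*k/N).
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < N / 2; ++k) {
      const cplx t =
          std::polar(1.0, -2.0 * pi * double(k) / double(N)) * odd[k];
      x[k] = even[k] + t;
      x[k + N / 2] = even[k] - t;
    }
  }
}
```

A production version would precompute the twiddle factors and avoid the temporary copies; whether such generated passes would actually close the gap to FFTW is exactly the kind of question the benchmarking advice above is about.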
If you help me with the automatic installation of the ducc sources, then I'm pretty sure I can make this work. I'll probably start work on a new branch though, otherwise the diffs become too large. Ducc requires C++17 already, so no special measures are needed if you want to use it in your FFT experiments. |
Sure. I will start a new branch for this and I will change cmake so that it pulls the sources. |
Perfect, thanks a lot! |
Hi Martin, I made the following fork: https://github.com/DiamonDinoia/finufft/tree/switchable-fft
The way this works: here
    mkdir build && cd build
    cmake ../ -DFINUFFT_BUILD_EXAMPLES:BOOL=ON \
        -DFINUFFT_BUILD_TESTS:BOOL=ON \
        -DFINUFFT_ENABLE_SANITIZERS:BOOL=ON \
        -DFINUFFT_USE_OPENMP:BOOL=ON \
        -DFINUFFT_USE_DUCC0:BOOL=ON \
        -DCMAKE_BUILD_TYPE=Release
I sometimes do this will automatically fetch ducc0 and create the define
I would keep an eye out, as we are about to merge the new vectorized spreader; it should not affect this, as the changes will be in separate files. The way I envision it is to write a wrapper for the various FFT calls:
    fft_makeplan
    fft_execute
and inside, with a define, we switch between the various implementations. |
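A minimal sketch of such a wrapper might look like the following. Everything here except the two function names and the FFTW calls is an assumption made for illustration: the macro `SKETCH_USE_FFTW`, the `fft_plan` struct, and `fft_destroy` are hypothetical, and the default branch uses a naive O(n^2) DFT as a stand-in where a real build would call ducc0, so that the sketch compiles without any external library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>
#ifdef SKETCH_USE_FFTW  // hypothetical macro, not the one the build defines
#include <fftw3.h>
#endif

using cplx = std::complex<double>;

// Hypothetical plan object: only the FFTW backend needs real planning.
struct fft_plan {
  std::size_t n = 0;
  int sign = -1;  // -1: forward, +1: backward (same values as FFTW)
  cplx* data = nullptr;
#ifdef SKETCH_USE_FFTW
  fftw_plan p = nullptr;
#endif
};

inline fft_plan fft_makeplan(std::size_t n, cplx* data, int sign) {
  assert(data != nullptr && (sign == -1 || sign == +1));
  fft_plan plan;
  plan.n = n;
  plan.sign = sign;
  plan.data = data;
#ifdef SKETCH_USE_FFTW
  plan.p = fftw_plan_dft_1d(static_cast<int>(n),
                            reinterpret_cast<fftw_complex*>(data),
                            reinterpret_cast<fftw_complex*>(data), sign,
                            FFTW_ESTIMATE);
#endif
  return plan;
}

inline void fft_execute(fft_plan& plan) {
#ifdef SKETCH_USE_FFTW
  fftw_execute(plan.p);
#else
  // Stand-in backend: naive in-place DFT (a real build would call ducc0).
  const double pi = std::acos(-1.0);
  std::vector<cplx> out(plan.n, cplx(0.0, 0.0));
  for (std::size_t k = 0; k < plan.n; ++k)
    for (std::size_t j = 0; j < plan.n; ++j)
      out[k] += plan.data[j] *
                std::polar(1.0, plan.sign * 2.0 * pi *
                                    double((j * k) % plan.n) / double(plan.n));
  for (std::size_t k = 0; k < plan.n; ++k) plan.data[k] = out[k];
#endif
}

inline void fft_destroy(fft_plan& plan) {
#ifdef SKETCH_USE_FFTW
  fftw_destroy_plan(plan.p);
  plan.p = nullptr;
#endif
  plan.data = nullptr;
}
```

The calling code then only ever sees `fft_makeplan`, `fft_execute` and `fft_destroy`, and the choice of backend is made once at compile time.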
Thanks for the instructions; I'm not really a cmake person, so they are really helpful! Not urgent, but I'm interested for the future: how do I enable generating the Python bindings with Also I'm currently running into a small problem when compiling:
This looks like a |
Nevermind :-) I found the answer to the Python installation in the docs! The compilation problem goes away if I comment out the line |
I'm encountering a small I'll probably just need |
Hi Martin, I pushed the changes you requested. More files can be added in Cheers, |
I think we can close this now; it is superseded by #463 |
I'm not sure if this is of much general interest, but here is a small experiment that switches all of finufft's FFTs from FFTW to ducc.fft (formerly known as pocketfft and used by scipy).
Advantages:
finufft (if so desired); this should make configuration and compilation easier.
FFTW_ESTIMATE was used before; for FFTW_MEASURE, FFTW may still win in most cases, but differences should be fairly small.
The changes to use ducc.fft are minimal (just a few lines inside finufft.cpp and small adjustments to the makefile).
In its current state, the PR is just meant as a demo. It does not remove any FFTW-related code except the actual fftw_execute call, which means that all FFTW planning etc. is still done (uselessly). Also, I only adjusted the makefile, since I'm not at all familiar with cmake.
If this looks interesting to you, please give it a try, run a few benchmarks and let me know if you have any questions!